Designing Reusable Systems that Can Handle Change - Description-Driven Systems: Revisiting Object-Oriented Principles
In the age of the Cloud and so-called Big Data, systems must be increasingly
flexible, reconfigurable and adaptable to change, in addition to being developed
rapidly. As a consequence, designing systems to cater for evolution is becoming
critical to their success. To cope with change, systems must support reuse and
be able to adapt to changing requirements as and when necessary. Allowing
systems to be self-describing is one way to
facilitate this. To address the issues of reuse in designing evolvable systems,
this paper proposes a so-called description-driven approach to systems design.
This approach enables new versions of data structures and processes to be
created alongside the old, thereby providing a history of changes to the
underlying data models and enabling the capture of provenance data. The
efficacy of the description-driven approach is exemplified by the CRISTAL
project. CRISTAL is based on description-driven design principles; it uses
versions of stored descriptions to define various versions of data which can be
stored in diverse forms. This paper discusses the need to capture a holistic
system description when modelling large-scale distributed systems.
Comment: 8 pages, 1 figure and 1 table. Accepted by the 9th Int Conf on the
Evaluation of Novel Approaches to Software Engineering (ENASE'14), Lisbon,
Portugal, April 2014.
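The core mechanism described above lends itself to a compact illustration. The sketch below shows one way a description-driven store might keep every version of a data-structure description alongside the old ones, with each item recording the description version it was created against; all names are illustrative assumptions, not the CRISTAL API.

```python
# Minimal sketch of the description-driven idea: descriptions of data
# structures are themselves versioned objects, so new versions can be
# created alongside old ones and every item records which description
# version it was created against. Names are illustrative, not CRISTAL's.
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Description:
    """A versioned description (schema) of a data structure."""
    name: str
    version: int
    fields: dict[str, type]


@dataclass
class Item:
    """A data item that keeps a reference to the description version it
    was instantiated from, preserving provenance across evolution."""
    description: Description
    values: dict[str, Any]


class DescriptionStore:
    """Keeps every version of every description; old versions are never
    overwritten, giving a history of changes to the data model."""
    def __init__(self) -> None:
        self._versions: dict[str, list[Description]] = {}

    def publish(self, name: str, fields: dict[str, type]) -> Description:
        history = self._versions.setdefault(name, [])
        desc = Description(name, version=len(history) + 1, fields=fields)
        history.append(desc)          # append alongside the old, never replace
        return desc

    def latest(self, name: str) -> Description:
        return self._versions[name][-1]

    def history(self, name: str) -> list[Description]:
        return list(self._versions[name])


store = DescriptionStore()
v1 = store.publish("Detector", {"id": str, "voltage": float})
v2 = store.publish("Detector", {"id": str, "voltage": float, "firmware": str})
old_item = Item(v1, {"id": "D-001", "voltage": 1.2})   # still valid under v1
new_item = Item(v2, {"id": "D-002", "voltage": 1.3, "firmware": "2.0"})
```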
A Description Driven Approach for Flexible Metadata Tracking
Evolving user requirements present a considerable software engineering
challenge, all the more so in an environment where data will be stored for a
very long time, and must remain usable as the system specification evolves
around it. Capturing the description of the system addresses this issue since a
description-driven approach enables new versions of data structures and
processes to be created alongside the old, thereby providing a history of
changes to the underlying data models and enabling the capture of provenance
data. This description-driven approach is advocated in this paper in which a
system called CRISTAL is presented. CRISTAL is based on description-driven
principles; it can use previous versions of stored descriptions to define
various versions of data which can be stored in diverse forms. To demonstrate
the efficacy of this approach, the history of the project at CERN is
presented: CRISTAL was used to track data and process definitions and their
associated provenance data in the construction of the CMS ECAL detector; it
was applied to handle analysis tracking and data index provenance in the
neuGRID and N4U projects; and it will be matured further in the CRISTAL-ISE
project. We believe that the CRISTAL approach could be invaluable in handling
the evolution, indexing and tracking of large datasets, and are keen to apply
it further in this direction.
Comment: 10 pages and 3 figures. arXiv admin note: text overlap with
arXiv:1402.5753, arXiv:1402.576
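To illustrate the provenance side of the approach, the following hedged sketch records each change to a stored description as an append-only provenance event, so the history of the data model can be replayed; the record fields are assumptions made for illustration and are not drawn from the CRISTAL codebase.

```python
# Each change to a stored description is recorded as a provenance event;
# the log is append-only, so it doubles as the change history of the
# underlying data model. All names are illustrative assumptions.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    description_name: str
    version: int
    author: str
    timestamp: datetime
    note: str


class ProvenanceLog:
    """Append-only: earlier records are never modified."""
    def __init__(self) -> None:
        self._records: list[ProvenanceRecord] = []

    def record(self, name: str, version: int, author: str, note: str) -> None:
        self._records.append(ProvenanceRecord(
            name, version, author, datetime.now(timezone.utc), note))

    def history(self, name: str) -> list[ProvenanceRecord]:
        return [r for r in self._records if r.description_name == name]


log = ProvenanceLog()
log.record("Detector", 1, "alice", "initial description")
log.record("Detector", 2, "bob", "added firmware field")
for rec in log.history("Detector"):
    print(rec.version, rec.author, rec.note)
```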
Towards Provenance and Traceability in CRISTAL for HEP
This paper discusses the CRISTAL object lifecycle management system and its
use in provenance data management and the traceability of system events. This
software was initially used to capture the construction and calibration of the
CMS ECAL detector at CERN for later use by physicists in their data analysis.
Some further uses of CRISTAL in different projects (CMS, neuGRID and N4U) are
presented as examples of its flexible data model. From these examples,
applications are drawn for the High Energy Physics domain, and some initial
ideas for its use in data preservation in HEP are outlined in this paper.
Investigations are currently underway to gauge the feasibility of using the
N4U Analysis Service, or a derivative of it, to address the requirements of
data and analysis logging and provenance capture within the HEP long-term data
analysis environment.
Comment: 5 pages and 1 figure. 20th International Conference on Computing in
High Energy and Nuclear Physics (CHEP13), 14-18th October 2013, Amsterdam,
Netherlands. To appear in Journal of Physics Conference Series.
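As a rough illustration of the object lifecycle traceability the paper describes, the sketch below stores every state transition of a managed item as an immutable event; the states and identifiers are invented for the example and do not reflect CRISTAL's actual data model.

```python
# Illustrative sketch (not CRISTAL code) of object lifecycle tracking:
# every state transition is stored as an immutable event, so the full
# construction/calibration history of an item remains traceable.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LifecycleEvent:
    state: str        # e.g. "constructed", "calibrated", "installed"
    agent: str        # who or what triggered the transition
    timestamp: datetime


@dataclass
class TrackedItem:
    """The current state is simply the last event; the whole event list
    is the traceability record available for later analysis."""
    item_id: str
    events: list[LifecycleEvent] = field(default_factory=list)

    def transition(self, state: str, agent: str) -> None:
        self.events.append(
            LifecycleEvent(state, agent, datetime.now(timezone.utc)))

    @property
    def state(self) -> str:
        return self.events[-1].state if self.events else "created"


crystal = TrackedItem("ECAL-crystal-0001")
crystal.transition("constructed", "assembly-line-3")
crystal.transition("calibrated", "test-bench-7")
print(crystal.state)        # "calibrated"
print(len(crystal.events))  # full trace preserved
```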
Scientific Workflow Repeatability through Cloud-Aware Provenance
Recording the provenance of the transformations, analyses and interpretations
of data in scientific workflows is vital for their repeatability and
reliability. Such provenance capture has been carried out effectively in
Grid-based scientific workflow systems. However, the recent adoption of
Cloud-based scientific workflows presents an opportunity to investigate the
suitability of existing approaches, or to propose new approaches, for
collecting provenance information from the Cloud and using it for workflow
repeatability on Cloud infrastructure. The dynamic nature of the Cloud makes
this difficult because, unlike on the Grid, resources are provisioned
on-demand. This paper presents a novel approach that can assist in mitigating
this challenge: it collects Cloud infrastructure information along with
workflow provenance and establishes a mapping between them, which is later
used to re-provision resources on the Cloud. Workflow execution is repeated
by: (a) capturing the Cloud infrastructure information (virtual machine
configuration) along with the workflow provenance, and (b) re-provisioning
similar resources on the Cloud and re-executing the workflow on them. The
evaluation of an initial prototype suggests that the proposed approach is
feasible and merits further investigation.
Comment: 6 pages; 5 figures; 3 tables. In Proceedings of the Recomputability
2014 workshop of the 7th IEEE/ACM International Conference on Utility and
Cloud Computing (UCC 2014), London, December 2014.
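The two-step approach in the abstract can be sketched directly. The code below is a minimal illustration rather than the authors' implementation: it captures a virtual machine configuration alongside each job's workflow provenance, then uses that mapping to re-provision and re-execute; provision_vm and run_job stand in for real cloud and workflow-engine calls and are hypothetical.

```python
# Sketch of cloud-aware provenance, assuming a generic cloud API:
# (a) capture the VM configuration with the workflow provenance, and
# (b) re-provision similar resources and re-run the workflow on them.
from dataclasses import dataclass


@dataclass(frozen=True)
class VMConfig:
    """The infrastructure half of the provenance mapping."""
    image: str
    vcpus: int
    ram_gb: int


@dataclass(frozen=True)
class JobProvenance:
    """One workflow step mapped to the VM configuration it ran on."""
    job_name: str
    inputs: tuple[str, ...]
    command: str
    vm: VMConfig


def capture(job_name, inputs, command, vm: VMConfig) -> JobProvenance:
    # Step (a): record workflow provenance together with the VM config.
    return JobProvenance(job_name, tuple(inputs), command, vm)


def repeat(prov: JobProvenance, provision_vm, run_job) -> None:
    # Step (b): re-provision a similar VM from the stored configuration,
    # then re-execute the recorded command on it. provision_vm and
    # run_job are hypothetical callables supplied by the caller.
    vm_handle = provision_vm(image=prov.vm.image,
                             vcpus=prov.vm.vcpus,
                             ram_gb=prov.vm.ram_gb)
    run_job(vm_handle, prov.command, prov.inputs)


prov = capture("analyse-run42", ["run42.dat"],
               "python analyse.py run42.dat",
               VMConfig(image="ubuntu-22.04", vcpus=4, ram_gb=16))
```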
Designing Traceability into Big Data Systems
Providing an appropriate level of accessibility and traceability to data or
process elements (so-called Items) in large volumes of data, often
Cloud-resident, is an essential requirement in the Big Data era.
Enterprise-wide data systems need to be designed from the outset to support
usage of such Items across the spectrum of business use rather than from any
specific application view. The design philosophy advocated in this paper is to
drive the design process using a so-called description-driven approach, which
enriches models with metadata and descriptions and focuses the design process
on Item re-use, thereby promoting traceability. Details are given of the
description-driven design of big data systems at CERN, in health informatics
and in business process management. Evidence is presented that the approach
leads to design simplicity and consequent ease of management thanks to loose
typing and the adoption of a unified approach to Item management and usage.
Comment: 10 pages; 6 figures. In Proceedings of the 5th Annual International
Conference on ICT: Big Data, Cloud and Security (ICT-BDCS 2015), Singapore,
July 2015. arXiv admin note: text overlap with arXiv:1402.5764,
arXiv:1402.575
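A brief sketch may help make the "everything is an Item" philosophy concrete. Assuming nothing beyond the abstract, the code below gives data and process elements one loosely typed representation with derivation links, so a single store can manage, version and trace both uniformly; all structures are illustrative.

```python
# Sketch of unified, loosely typed Item management: the kind is a tag,
# the payload is free-form, and trace links record derivation, so data
# and process elements are handled by one store. Illustrative only.
from dataclasses import dataclass, field
from typing import Any
import uuid


@dataclass
class Item:
    kind: str                          # e.g. "data", "process"
    payload: dict[str, Any]
    item_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    derived_from: list[str] = field(default_factory=list)


class ItemStore:
    def __init__(self) -> None:
        self._items: dict[str, Item] = {}

    def add(self, item: Item) -> str:
        self._items[item.item_id] = item
        return item.item_id

    def trace(self, item_id: str) -> list[Item]:
        """Walk derivation links back to the roots (the traceability)."""
        item = self._items[item_id]
        lineage = [item]
        for parent_id in item.derived_from:
            lineage.extend(self.trace(parent_id))
        return lineage


store = ItemStore()
raw = store.add(Item("data", {"path": "/data/run42.root"}))
calib = store.add(Item("process", {"name": "calibrate"}))
out = store.add(Item("data", {"path": "/data/run42_cal.root"},
                     derived_from=[raw, calib]))
print([i.kind for i in store.trace(out)])   # ['data', 'data', 'process']
```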
The Requirements for Ontologies in Medical Data Integration: A Case Study
Evidence-based medicine is critically dependent on three sources of
information: a medical knowledge base, the patient's medical record and
knowledge of available resources including, where appropriate, clinical
protocols. Patient data is often scattered in a variety of databases and may,
in a distributed model, be held across several disparate repositories.
Consequently, addressing the needs of an evidence-based medicine community
presents issues of biomedical data integration, clinical interpretation and
knowledge management. This paper outlines how the Health-e-Child project has
approached the challenge of requirements specification for (bio-) medical data
integration, from the level of cellular data, through disease to that of
patient and population. The approach is illuminated through the requirements
elicitation and analysis of Juvenile Idiopathic Arthritis (JIA), one of three
diseases being studied in the EC-funded Health-e-Child project.
Comment: 6 pages, 1 figure. Presented at the 11th International Database
Engineering & Applications Symposium (Ideas2007), Banff, Canada, September
2007.
Analysis traceability and provenance for HEP
This paper presents the use of the CRISTAL software in the N4U project. CRISTAL was used to create a set of provenance-aware analysis tools for the neuroscience domain. The paper advocates that the approach taken in N4U to build the analysis suite is sufficiently generic to be applied to the HEP domain. A mapping to the PROV model for provenance interoperability is also presented, together with how it can be applied in the HEP domain to make HEP analyses interoperable.
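As a rough indication of what such a mapping might look like, the sketch below emits simple PROV-N statements for one recorded analysis step, mapping a data item to prov:Entity, an execution to prov:Activity and an operator to prov:Agent; only the PROV terms come from the W3C standard, while the record layout is a hypothetical example, not the N4U schema.

```python
# CRISTAL-style concepts mapped onto W3C PROV: item -> prov:Entity,
# execution/event -> prov:Activity, operator -> prov:Agent.
def to_prov_n(job: dict) -> list[str]:
    """Emit simple PROV-N statements for one recorded analysis step."""
    stmts = [
        f"entity(ex:{job['output']})",
        f"activity(ex:{job['activity']})",
        f"agent(ex:{job['agent']})",
        f"wasGeneratedBy(ex:{job['output']}, ex:{job['activity']}, -)",
        f"wasAssociatedWith(ex:{job['activity']}, ex:{job['agent']}, -)",
    ]
    stmts += [f"used(ex:{job['activity']}, ex:{i}, -)" for i in job["inputs"]]
    return stmts


# Hypothetical analysis record for illustration.
record = {"activity": "fit_mass_peak", "agent": "analyst01",
          "inputs": ["ntuple_2012B"], "output": "fit_result_v1"}
print("\n".join(to_prov_n(record)))
```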
Provision of an integrated data analysis platform for computational neuroscience experiments
Purpose – The purpose of this paper is to provide an integrated analysis base to facilitate computational neuroscience experiments, following a user-led approach to provide access to the integrated neuroscience data and to enable the analyses demanded by the biomedical research community.
Design/methodology/approach – The design and development of the N4U analysis base and related information services addresses the existing research and practical challenges by offering an integrated medical data analysis environment with the necessary building blocks for neuroscientists to optimally exploit neuroscience workflows, large image data sets and algorithms to conduct analyses.
Findings – The provision of an integrated e-science environment for computational neuroimaging can enhance the prospects, speed and utility of the data analysis process for neurodegenerative diseases.
Originality/value – The N4U analysis base enables biomedical data analyses to be conducted by indexing and interlinking the neuroimaging and clinical study data sets stored on the grid infrastructure, together with algorithms and scientific workflow definitions and their associated provenance information.
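The indexing-and-interlinking idea at the heart of the analysis base can be sketched minimally: the toy index below cross-links datasets, workflow definitions and provenance records so that everything related to an analysis can be resolved from any one of its parts; the key scheme is an assumption made for illustration.

```python
# Toy sketch of indexing and interlinking: datasets, workflows and
# provenance records are cross-linked in both directions, so an
# analysis can be resolved to everything it touched. Keys are invented.
from collections import defaultdict


class AnalysisIndex:
    def __init__(self) -> None:
        self._links: dict[str, set[str]] = defaultdict(set)

    def link(self, a: str, b: str) -> None:
        # Store the link symmetrically so lookups work from either end.
        self._links[a].add(b)
        self._links[b].add(a)

    def related(self, key: str) -> set[str]:
        return set(self._links[key])


index = AnalysisIndex()
index.link("image:MRI-0042", "study:ADNI-baseline")
index.link("workflow:cortical-thickness-v3", "image:MRI-0042")
index.link("workflow:cortical-thickness-v3", "prov:run-2013-07-01")
print(index.related("image:MRI-0042"))
```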